A Graph Based Approach to Speaker Retrieval in Talk Show Videos with Transcript-Based Supervision
Authors
Abstract
This paper proposes a graph-based strategy for retrieving frames that contain a queried speaker in talk show videos. Using the who-is-speaking-and-when information from the audio transcript, an initial audio-based step restricts the search for the queried person to the frames in which he or she is speaking; this is combined with a second step that analyzes the visual features of shots. Specifically, exploiting the production properties of talk show video: (1) a shot-based graph is constructed first, and its densest subgraph is returned as the final result. Instead of a direct search (DS) for the densest part, however, (2) the intra-node and inter-node connections are modeled by a frame-layer degree map, which takes the duration information within each shot node into account; and (3) a graph partition strategy is proposed that places no restriction on the shape or number of subgraphs and in which shots containing the same person are more similar to each other. Experiments on one episode of the French talk show “Le Grand Echiquier” show an average improvement of more than 10% over the audio-only method and more than 7.5% over the DS method.
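The "densest subgraph" idea behind the direct-search (DS) baseline can be illustrated with a minimal sketch. This is not the paper's method: it omits the frame-layer degree map and the graph partition strategy, and it swaps in the well-known greedy peeling approximation for densest subgraph. The function name `densest_subgraph`, the shot indices, and the similarity weights are all hypothetical, chosen only for illustration.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[int, int]


def densest_subgraph(weights: Dict[Edge, float]) -> Tuple[Set[int], float]:
    """Greedy peeling approximation (an assumption, not the paper's algorithm).

    Nodes are shot indices; `weights` maps an undirected edge (u, v) to a
    visual-similarity score. Density is total edge weight / number of nodes.
    """
    # Build a symmetric adjacency map, ignoring self-loops.
    adj: Dict[int, Dict[int, float]] = {}
    for (u, v), w in weights.items():
        if u == v:
            continue
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w

    if not adj:
        return set(), 0.0

    degree = {u: sum(nb.values()) for u, nb in adj.items()}
    total = sum(degree.values()) / 2.0  # each edge is counted twice
    alive = set(adj)
    best_nodes, best_density = set(alive), total / len(alive)

    # Repeatedly drop the node with the lowest weighted degree and
    # remember the densest intermediate node set.
    while len(alive) > 1:
        u = min(alive, key=lambda n: degree[n])
        alive.remove(u)
        total -= degree[u]
        for v, w in adj[u].items():
            if v in alive:
                degree[v] -= w
        density = total / len(alive)
        if density > best_density:
            best_density, best_nodes = density, set(alive)

    return best_nodes, best_density


# Hypothetical candidate shots taken from the speaker's speech segments:
# shots 1-3 look alike (the queried guest), shot 4 is a weakly similar cutaway.
similarity = {(1, 2): 1.0, (1, 3): 1.0, (2, 3): 1.0, (1, 4): 0.5}
shots, density = densest_subgraph(similarity)
print(sorted(shots), density)
```

In this toy graph the weakly connected shot is peeled away first, leaving the three mutually similar shots as the retrieved set. A direct search of this kind treats every shot node equally regardless of its length, which is the limitation the frame-layer degree map in the abstract is meant to address.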
Similar resources
A New Method for Characterization of Biological Particles in Microscopic Videos: Hypothesis Testing Based on a Combination of Stochastic Modeling and Graph Theory
Introduction: Studying the motility of biological objects is important in many biomedical processes. Therefore, automated analysis of microscopic videos has become an important step in recent research. Materials and Methods: In the method proposed in this article, a hypothesis testing function is defined to separate biological particles from artifact and noise in captured v...
A Novel Approach for Detecting Relationships in Social Networks Using Cellular Automata Based Graph Coloring
Any social network can be modeled as a graph in which each individual acts as a vertex and each relation acts as an edge. The graph can be written as G = [V;E], where V is the set of vertices and E is the set of edges. Any social network can be segmented into K groups, where the members of each group share the same features. In each group each person knows other individuals and is in touch ...
Fuzzy retrieval of encrypted data by multi-purpose data-structures
The growing amount of information that has arisen from emerging technologies has caused organizations to face challenges in maintaining and managing their information. Expanding hardware and human resources, or outsourcing data management and maintenance to an external organization in the form of cloud storage services, are two common approaches to overcome these challenges. The first approach costs of...
The Logic of Conversation and the Mystical Ghazal
The logic of conversation is based on the speaker (I), the hearer (you) and the referent (he). It can take three forms: 1. The speaker talks to the listener about the referent. 2. The speaker talks about the listener to the listener. 3. The speaker talks about himself to the listener. The present paper elaborates on these issues and explains how this logic takes shape in mystical lyrics. The mo...
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...